Abstract

This study uses citation analysis from two citation tracking databases, Google Scholar (GS) and ISI Web of
Science, to test the correlation between them and to examine the effect of the number of paper versions on
citations. The data were retrieved from Essential Science Indicators and Google Scholar for 101 highly cited
papers from Malaysia in the field of engineering. An equation for estimating ISI citations from Google Scholar
citations is proposed. The results show a significant, positive relationship between the number of versions and
citations in both Google Scholar and ISI Web of Science. This relationship is stronger between versions and
ISI citations (r = 0.395, p<0.01) than between versions and Google Scholar citations (r = 0.315, p<0.01). Free
access to Google Scholar data, together with its correlation with the costly ISI citation counts, allows more
transparency in tenure reviews, funding agency decisions and other science policy, by making it possible to
count citations and analyze scholars' performance more precisely.

Keywords: bibliometrics, citation analysis, evaluations, equivalence, Google Scholar, highly cited, ISI Web of
Science, research tools, H-index

1. Introduction 

Citation indexing, a bibliometric method, traces the references in published articles and shows how many times
an article has been cited by other articles (Fooladi et al., 2013). Citations are used to evaluate academic
performance and the importance of the information contained in an article (Zhang, 2009). This feature helps
researchers get a preliminary idea of the articles and research that make an impact in a field of interest. The
avenues for citation tracking have greatly increased in recent years (Kear & Colbert-Lewis, 2011).
Citation analysis was monopolized for decades by the system developed by Eugene Garfield at the Institute for
Scientific Information (ISI), now owned by Thomson Reuters Scientific (Bensman, 2011). ISI Web of Science is a
publication and citation database that has covered all domains of science and social science for many years
(Aghaei Chadegani et al., 2013). In 2004, two competitors emerged: Scopus and Google Scholar (Bakkalbasi,
Bauer, Glover, & Wang, 2006). Google Inc. released the beta version of Google Scholar (GS)
(http://scholar.google.com) in November 2004 (Pauly & Stergiou, 2005). These three tools, ISI from Thomson
Reuters, Google Scholar from Google Inc. and Scopus from Elsevier, are used by academics to track their
citation rates. Access to ISI Web of Science is a subscription-based service, while GS provides a free
alternative for retrieving citation counts. Therefore, researchers need a way to estimate their ISI citations from
their GS citation counts. On the other hand, publishing a research paper in a scholarly journal is necessary but
not sufficient for receiving citations in the future (Nader Ale Ebrahim, 2013). The paper should be visible to
relevant users and authors in order to be cited. In this study, the visibility of a paper is defined by the number
of its versions available in the Google Scholar database. The number of citations may be limited by the number
of versions of the published article available on the web. The literature has shown that making research outputs
available through open access repositories increases their visibility, which results in wider access and higher
citation impact (Nader Ale Ebrahim et al., 2014; Amancio, Oliveira Jr, & da Fontoura Costa, 2012; Antelman,
2004; Ertürk & Şengül, 2012; Hardy, Oppenheim, Brody, & Hitchcock, 2005). A paper has a greater chance of
becoming highly cited when it is more visible (Nader Ale Ebrahim et al., 2013; Egghe, Guns, & Rousseau, 2013).

The objectives of this paper are twofold. The first objective is to find the correlation between Google Scholar
and ISI citations in highly cited papers. The second objective is to find a relationship between paper
availability and the number of citations.

2. Google Scholar & Web of Science Citations 

The citation facility of Google Scholar is a potential new tool for bibliometrics (Kousha & Thelwall, 2007).
Google Scholar, a free-of-charge service from the search engine giant Google, has been suggested as an
alternative or complementary resource to commercial citation databases such as Web of Knowledge
(ISI/Thomson) or Scopus (Elsevier) (Aguillo, 2011). Google Scholar provides bibliometric information on a wide
range of scholarly journals and other published material, such as peer-reviewed papers, theses, books, abstracts
and articles, from academic publishers, professional societies, preprint repositories, universities and other
scholarly organizations (Orduna-Malea & Delgado Lopez-Cozar, 2014). GS also introduced two new services in
recent years: Google Scholar Author Citation Tracker in 2011 and Google Scholar Metrics for Publications in April
2012 (Jacso, 2012). Some of these documents might not otherwise be indexed by search engines such as Google,
so they would be "invisible" to web searchers, and some would clearly be similarly invisible to Web of Science
users, since that database is dominated by academic journals (Kousha & Thelwall, 2007). In contrast, the Thomson
Reuters/Institute for Scientific Information (ISI) databases, or Web of Science (the names of the former ISI
databases are used somewhat ambiguously), include three databases: Science Citation Index/Science Citation
Index Expanded (SCI/SCIE) (SCIE is the online version of SCI), Social Science Citation Index (SSCI) and Arts and
Humanities Citation Index (AHCI) (Larsen & von Ins, 2010). Since 1964, the Science Citation Index (SCI) has been
a leading indexing tool (Garfield, 1972).

Few studies have examined the correlation between GS and WoS citations. Cabezas-Clavijo and
Delgado-Lopez-Cozar (2013) found that average h-index values in Google Scholar are almost 30% higher than
those obtained in ISI Web of Science, and about 15% higher than those collected by Scopus. GS citation data
differed greatly from findings based on citations from fee-based databases such as ISI Web of Science
(Bornmann et al., 2009). Google Scholar overestimates the number of citable articles (in comparison with formal
citation services such as Scopus and Thomson Reuters) because of the automated way it collects data, including
'grey' literature such as theses (Hooper, 2012). The first objective of this study is to find the correlation between
Google Scholar and ISI citations in highly cited papers.

3. Visibility and Citation Impact 

Based on a case study, Nader Ale Ebrahim et al. (2014) confirmed that article visibility greatly improves citation
impact. Journal visibility has an important influence on journal citation impact (Yue & Wilson, 2004), and
greater visibility leads to higher citation impact (Zheng et al., 2012). Conversely, lack of visibility considerably
reduces citation impact (Rotich & Musakali, 2013). By reviewing the relevant papers, Nader Ale Ebrahim et al.
(2013) identified 33 different ways of increasing the likelihood of citation. Their results show that more visible
articles tend to receive more downloads and citations. To improve the visibility of scholars' work and keep it
relevant on the academic scene, electronic publishing is advisable, because it allows readers to search for and
locate articles in minimal time, within one journal or across multiple journals. This includes publishing articles
in reputable, peer-reviewed journals that are listed in various databases (Rotich & Musakali, 2013). Free online
availability substantially increases a paper's impact (Lawrence, 2001a). Lawrence (2001a, 2001b) demonstrated a
correlation between the likelihood of online availability of the full-text article and the total number of citations.
He further showed that the relative citation counts for articles available online are on average 336% higher than
those for articles not found online (Craig, Plume, McVeigh, Pringle, & Amin, 2007).

However, few studies have examined the relationship between paper availability and the number of citations
(Craig et al., 2007; Lawrence, 2001b; McCabe & Snyder, 2013; Solomon, Laakso, & Bjork, 2013). None of them
discusses the relationship between the number of versions and citations. The number of "versions" is shown in
Google Scholar search results. Figure 1 shows the 34 versions and the number of citations of an article entitled
"Virtual Teams: a Literature Review" (Nader Ale Ebrahim, Ahmed, & Taha, 2009). The second objective of this
research is to find a relationship between paper availability and the number of citations.

Virtual Teams: a Literature Review. 

NA Ebrahim , S Ahmed , Z Taha - Australian Journal of Basic . . 2009 - search.ebscohost.com 
Abstract In the competitive market, virtual teams represent a growing response to the need 
for fasting time-to-market, low-cost and rapid solutions to complex organizational problems.
Virtual teams enable organizations to pool the talents and expertise of employees and ... 
Cited by 95 Related articles All 34 versions Cite Saved 

Figure 1. The number of "versions" in the Google Scholar search result

4. Methodology 

Highly cited papers from Malaysia in the field of engineering were retrieved from Essential Science Indicators
(ESI), which is one of the Web of Knowledge (WoK) databases. ESI provides access to a comprehensive
compilation of scientists' performance statistics and science trend data derived from the WoK Thomson Reuters
databases. Total citation counts and cites per paper are indicators of the influence and impact of each paper.
Highly cited papers are selected using a threshold based on the ESI baseline data, and this threshold differs from
one discipline to another. ESI rankings are determined for the most cited authors, institutions, countries, and
journals (The Thomson Corporation, 2013). To be selected, a paper must have been published within the last
10 years and four months (January 1, 2003 to April 30, 2013) and must be cited above the threshold level.
The Essential Science Indicators data used in this research were updated as of July 1, 2013.
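The selection rule above can be expressed as a simple filter. The sketch below is only an illustration: the `Paper` record, its field names and the threshold value in the usage example are hypothetical, and the actual field-specific thresholds from the ESI baseline data are not reproduced here. Following the wording "cited above the threshold level", the comparison is strict.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Selection window stated in the text.
WINDOW_START = date(2003, 1, 1)
WINDOW_END = date(2013, 4, 30)


@dataclass
class Paper:
    """Hypothetical paper record; field names are illustrative, not ESI's export format."""
    title: str
    field: str
    published: date
    citations: int


def is_highly_cited(paper: Paper, thresholds: dict[str, int]) -> bool:
    """True if the paper was published inside the window and is cited above
    the threshold for its field."""
    in_window = WINDOW_START <= paper.published <= WINDOW_END
    return in_window and paper.citations > thresholds[paper.field]


def select_highly_cited(papers: list[Paper], thresholds: dict[str, int]) -> list[Paper]:
    """Keep only the papers that pass the selection rule."""
    return [p for p in papers if is_highly_cited(p, thresholds)]


# Usage with an invented threshold (not a real ESI value).
thresholds = {"Engineering": 50}
papers = [
    Paper("A", "Engineering", date(2009, 5, 1), 60),     # inside window, above threshold
    Paper("B", "Engineering", date(2002, 12, 31), 200),  # published before the window
    Paper("C", "Engineering", date(2010, 1, 1), 50),     # not above the threshold
]
print([p.title for p in select_highly_cited(papers, thresholds)])  # ['A']
```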

Google Scholar, a free online database, was used to retrieve the number of citations and versions of the ESI
highly cited papers. The ESI data were collected on 29 July 2013 and the Google Scholar data on 31 July 2013.
A total of 101 papers were listed in ESI as highly cited papers from Malaysia in the field of engineering. The list
of 101 papers was retrieved from the ESI database and exported to an Excel sheet. A search engine was
developed to retrieve the number of citations and versions from Google Scholar. This tool helped the
researchers collect the data more precisely and faster than searching for the papers one by one. The Statistical
Package for the Social Sciences (SPSS) was used for analyzing the data. The results are illustrated in the
following section.
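The text does not describe how the authors' search tool worked. As a hypothetical illustration of the parsing step only, the sketch below extracts the two counts from the text of a single Google Scholar result, in the form shown in Figure 1. It does not query Google Scholar, and the assumed wording ("Cited by N", "All N versions") may differ between interface languages or change over time.

```python
import re
from typing import Optional, Tuple

# Assumed wording of the two counts in an English-language result (see Figure 1).
CITED_BY = re.compile(r"Cited by ([\d,]+)")
ALL_VERSIONS = re.compile(r"All ([\d,]+) versions")


def _to_int(match: Optional[re.Match]) -> Optional[int]:
    """Convert a captured number such as '1,234' to an int; None if there was no match."""
    return int(match.group(1).replace(",", "")) if match else None


def parse_counts(result_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (citations, versions) found in one result's text; None when a count is absent."""
    return _to_int(CITED_BY.search(result_text)), _to_int(ALL_VERSIONS.search(result_text))


# The result line reproduced in Figure 1.
print(parse_counts("Cited by 95 Related articles All 34 versions Cite Saved"))  # (95, 34)
```

Returning `None` rather than 0 for a missing count keeps "not shown" distinct from a genuine zero when the parsed values are later exported for analysis.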

5. Results and Discussion 

Hereafter, the citation counts derived from the Web of Knowledge platform are called ISI citations. To study the
relationships among the number of citations in Google Scholar, the number of citations in ISI and the number of
versions, correlation coefficients were computed.

Table 1 shows descriptive statistics of the variables. 



Table 1. Descriptive statistics of variables

Variable                   N     Minimum   Maximum   Mean    Std. Deviation
Versions                   101   2         28        5.62    3.078
Cited in Google Scholar    101   4         348       80.76   71.718
ISI citation               101   5         189       43.15   36.076

As the numbers of citations in both Google Scholar and ISI were normally distributed, the Pearson correlation
coefficient (r) was used, and the results showed a very high, positive and significant association (r = 0.932,
p<0.01) between the number of citations in Google Scholar and ISI for the articles published from 2006 to 2012
from Malaysia in the field of engineering. To study the relationship between both citation counts and the number
of versions, Spearman's rho was used because the number of versions was not normally distributed. The results
showed a significant and positive relationship between the number of versions and the citations in both Google
Scholar and ISI. This relationship was stronger between versions and ISI citations (r = 0.395, p<0.01) than
between versions and Google Scholar citations (r = 0.315, p<0.01). Linear regression was also applied to predict
the number of citations in ISI based on Google Scholar citations. The results showed a very high predictability
(R² = 0.836) for the linear model (see Figure 2), which was significant (F = 511.63, p<0.01). Therefore, the final
equation for estimating ISI citations from Google Scholar citations is:

ISI Citation = 5.961 + 0.460 (Google Scholar citation) 
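The two correlation coefficients used above can be illustrated with a minimal, dependency-free sketch. It is not the SPSS procedure used in the study and omits significance tests; it only shows that Spearman's rho is Pearson's r computed on ranks (with tied values given their average rank), which is why it is less sensitive to skewed variables such as the number of versions. The example values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: a monotonic relationship with one extreme value.
versions = [2, 3, 4, 5, 28]
citations = [10, 20, 30, 40, 50]
print(round(spearman(versions, citations), 3))  # 1.0
print(round(pearson(versions, citations), 3))
```

Here the single extreme value (28 versions) pulls Pearson's r below 1, while the ranks remain perfectly ordered, so Spearman's rho still reports a perfect monotonic association.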

[Scatter plot; horizontal axis: Google Scholar Citation (0 to 400); vertical axis: ISI citation]

Figure 2. Scatter diagram between ISI citation and Google Scholar citation
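As a hedged illustration of how the equation might be applied, the sketch below turns it into a function and checks two of its properties against the summary statistics already reported: an ordinary least-squares line passes through the point of means (Table 1), and its slope equals r multiplied by the ratio of the ISI and Google Scholar standard deviations, where r can be taken as the square root of R² = 0.836. The function name is hypothetical. The result is a point estimate without a prediction interval, and the model was fitted only on highly cited Malaysian engineering papers with 4 to 348 Google Scholar citations, so estimates outside that range or for other fields are extrapolations.

```python
from math import sqrt

# Coefficients reported in the text.
INTERCEPT = 5.961
SLOPE = 0.460


def estimate_isi_citations(gs_citations: float) -> float:
    """Point estimate: ISI citation = 5.961 + 0.460 * (Google Scholar citation)."""
    return INTERCEPT + SLOPE * gs_citations


# A paper with 100 Google Scholar citations.
print(round(estimate_isi_citations(100), 3))  # 51.961

# Check 1: the OLS line passes through the means in Table 1 (GS 80.76, ISI 43.15).
print(round(estimate_isi_citations(80.76), 2))  # 43.11

# Check 2: slope = r * (SD of ISI / SD of GS), with r = sqrt(R^2) and the Table 1 SDs.
print(round(sqrt(0.836) * 36.076 / 71.718, 3))  # 0.46
```

The small gap between 43.11 and the reported mean of 43.15 is consistent with rounding of the published coefficients.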

To study the effect of the number of versions on citations in both Google Scholar and ISI, simple linear
regression was applied. The results indicated that the number of versions had a significant positive effect on
citations in both databases (see Table 2 and Table 3).

Table 2. Summary of regression analysis results

Model   R Square   F          Beta    t       P
a       0.276      39.12**    0.532   6.255   <0.01
b       0.272      38.316**   0.528   6.19    <0.01

Predictors: Versions
a: Dependent Variable: Cited in Google Scholar; b: Dependent Variable: ISI citation
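For a single predictor, as in Table 2, the regression F statistic (with 1 and n-2 degrees of freedom) equals the square of the slope's t statistic, so the F and t columns can be cross-checked: 6.255² ≈ 39.13 and 6.19² ≈ 38.32, in line with the reported F values of 39.12 and 38.316. The minimal sketch below is an illustration, not the SPSS output format: it fits a one-predictor ordinary least-squares model and computes both statistics. The toy data are invented.

```python
from math import sqrt


def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return a, b, R^2, the slope's t and F."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx                  # slope
    a = my - b * mx                # intercept: the line passes through (mean x, mean y)
    sse = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))  # residual sum of squares
    sst = sum((v - my) ** 2 for v in y)                       # total sum of squares
    mse = sse / (n - 2)            # residual mean square, n - 2 degrees of freedom
    t = b / sqrt(mse / sxx)        # slope divided by its standard error
    f = (sst - sse) / mse          # regression mean square (1 df) over residual mean square
    return {"a": a, "b": b, "r2": 1 - sse / sst, "t": t, "F": f}


# Invented toy data (not from the study).
fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(fit["b"], 3), round(fit["a"], 3), round(fit["r2"], 3))  # 0.6 2.2 0.6
print(round(fit["t"] ** 2, 6), round(fit["F"], 6))  # 4.5 4.5

# Cross-check of the Table 2 columns: t squared against the reported F.
print(round(6.255 ** 2, 3), round(6.19 ** 2, 3))  # 39.125 38.316
```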



Table 3. Descriptive statistics of variables by year

                    Versions          Cited in Google Scholar    ISI citation
Year          N     Mean    SD        Mean      SD               Mean    SD
Before 2009   20    7.75    5.25      152.85    103.741          79.8    46.6
2009          26    6.08    1.695     101.19    38.948           56.96   20.577
2010          18    5.11    2.193     70.17     50.097           41.44   26.86
2011          16    4.31    1.352     49.25     33.66            21.31   12.015
2012          21    4.48    2.089     19.9      9.518            9.24    3.315

A comparison between Google Scholar and ISI citation for highly cited papers from Malaysia in the field of 
engineering (see Figure 3) shows that the citation counts in Google Scholar are always higher than the number of 
citations in ISI. 
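The per-year means in Table 3 can be cross-checked against Table 1 with a short calculation: weighting each year group's mean by its N should reproduce the overall means, and the ratio of mean ISI citations to mean Google Scholar citations shows how much lower the ISI counts are in each group. The sketch below only re-uses the values printed in Table 3; the names are illustrative.

```python
# Values from Table 3: year group -> (N, mean GS citations, mean ISI citations).
TABLE3 = {
    "Before 2009": (20, 152.85, 79.8),
    "2009": (26, 101.19, 56.96),
    "2010": (18, 70.17, 41.44),
    "2011": (16, 49.25, 21.31),
    "2012": (21, 19.9, 9.24),
}


def pooled_mean(table, column):
    """N-weighted mean of a per-group mean (column 1 = Google Scholar, 2 = ISI)."""
    total_n = sum(row[0] for row in table.values())
    return sum(row[0] * row[column] for row in table.values()) / total_n


def isi_to_gs_ratios(table):
    """Ratio of mean ISI citations to mean Google Scholar citations per year group."""
    return {year: isi / gs for year, (_, gs, isi) in table.items()}


print(sum(row[0] for row in TABLE3.values()))  # 101
print(round(pooled_mean(TABLE3, 1), 2), round(pooled_mean(TABLE3, 2), 2))  # 80.76 43.15
for year, ratio in isi_to_gs_ratios(TABLE3).items():
    print(f"{year}: {ratio:.2f}")
```

With these values the ratio ranges from about 0.43 (2011) to about 0.59 (2010), so the mean ISI count is roughly half the mean Google Scholar count in every year group.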



[Bar chart; horizontal axis: Years (before 2009, 2009, 2010, 2011, 2012); series: Cited in Google Scholar, ISI citation]

Figure 3. Comparison between Google Scholar and ISI citations

6. Conclusion 

The number of publications and the number of citations in ISI Web of Science are used to measure researchers'
scientific performance and research impact. However, these numbers are not freely available. The results of this
study indicate a strong correlation between the number of citations in Google Scholar and in ISI Web of Science;
therefore, the proposed equation can be used as a reference for converting Google Scholar citation counts to ISI
citation counts. In addition, the number of versions has a significant positive effect on the number of citations in
both systems. This finding supports other researchers' findings on paper visibility (Amancio et al., 2012;
Antelman, 2004; Egghe et al., 2013; Ertürk & Şengül, 2012; Hardy et al., 2005). Therefore, researchers can
increase the impact of their research by increasing the visibility of their research papers (i.e., the number of
paper versions). Future studies are needed to determine the relationship between citation counts in other
databases, such as Microsoft Academic Research, Scopus and CiteSeer, and ISI, considering both journal articles
and conference papers.